Skip to content

Conversation

@aojea
Copy link
Contributor

@aojea aojea commented Oct 17, 2025

apply network policies to existing connections.

The dataplane now inspect the existing connections in the conntrack table and evaluates against the current network policies. If a established connection is no longer allowed then the dataplane it sets the conntrack timeout to zero, so next packets of the connections are reevaluated.

Fixes: #246

TODO:

  • API design, flag with enum, multiple values
  • Do we apply it periodically or rate limited or just everytime that a network policy changes?

cc: @danwinship

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 17, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from thockin October 17, 2025 14:02
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 17, 2025
@aojea
Copy link
Contributor Author

aojea commented Oct 17, 2025

not working :/

@aojea aojea force-pushed the strict_mode branch 2 times, most recently from 1849ebd to f131351 Compare October 19, 2025 14:05
@aojea aojea added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2025
@aojea aojea force-pushed the strict_mode branch 7 times, most recently from 629963a to 9f9e727 Compare October 19, 2025 22:02
@aojea aojea removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2025
@aojea aojea requested a review from Copilot October 19, 2025 22:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Enables a "strict mode" that reevaluates existing conntrack flows against current NetworkPolicies and times out disallowed connections, ensuring policy changes apply to established connections.

  • Adds StrictMode flag and threads it through configuration/command entrypoints.
  • Implements a firewallEnforcer that scans conntrack entries and zeroes the timeout on disallowed flows.
  • Adds an e2e test for dropping established connections and adjusts logging.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/e2e_standard.bats Adds an E2E test for established-connection enforcement and tweaks install verbosity.
pkg/dataplane/controller.go Introduces StrictMode, a conntrack enforcement runner, and logging refinements.
pkg/dataplane/conntrack.go Adds PacketFromFlow utility to convert conntrack entries into evaluatable packets.
pkg/cmd/cmd.go Adds --strict-mode flag, defaulting to true.
cmd/kube-network-policies/*/main.go Wires StrictMode from flags into dataplane config.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@aojea aojea force-pushed the strict_mode branch 5 times, most recently from 6787ff6 to 95e38d4 Compare October 20, 2025 09:47
@aojea
Copy link
Contributor Author

aojea commented Oct 20, 2025

@danwinship this is ready

@aojea
Copy link
Contributor Author

aojea commented Oct 21, 2025

/hold

Florian (netlink) explained this behavior is racy https://patchwork.ozlabs.org/project/netfilter-devel/patch/[email protected]/

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2025
@aojea aojea force-pushed the strict_mode branch 3 times, most recently from 29d9f63 to cd9a667 Compare October 21, 2025 10:45
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2025
@aojea
Copy link
Contributor Author

aojea commented Oct 21, 2025

Ok, this is finally ready, I just had to go into a big rabbit hole with the ct labels but is the most elegant way to solve it

Pending of two fixes in the libraries and to split in commits, but this is the final PR and we go for v1

vishvananda/netlink#1135
florianl/go-nfqueue#70

@aojea aojea force-pushed the strict_mode branch 3 times, most recently from 40ff1c4 to 2a125c3 Compare October 22, 2025 05:16
bitPos := uint(bitIndex % 64)
mask := uint64(1) << bitPos

result := make([]byte, 16)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do this at the top and then you'd be able to return result in the error case.
(EDIT: like in clearLabelBit!)

} else {
// Bit is in the LSW (0-63), serialize the mask into the last 8 bytes.
binary.BigEndian.PutUint64(result[8:], mask)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use uint64s here...

byteIndex := bitIndex / 8
bitPos := uint(bitIndex % 8)
mask := uint8(1) << bitPos
result[byteIndex] = mask

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that gives me the returned value in the reverse order, I think is the oppsite "endianess"

 /usr/local/google/home/aojea/src/kube-network-policies/pkg/dataplane/conntrack_test.go:76: generateLabelMask() for index 10:
         Got:  00040000000000000000000000000000
         Want: 00000000000000000000000000000400
=== RUN   TestGenerateLabelMask/Bit_126_(MSW)
    /usr/local/google/home/aojea/src/kube-network-policies/pkg/dataplane/conntrack_test.go:76: generateLabelMask() for index 126:
         Got:  00000000000000000000000000000040
         Want: 40000000000000000000000000000000

newWord := currentWord & zeroMask

// 5. Write the modified 64-bit word back into the new array.
binary.BigEndian.PutUint64(newLabel[targetStartIndex:], newWord)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byteIndex := bitIndex / 8
bitPos := uint(bitIndex % 8)
mask := uint8(1) << bitPos
newLabel[byteIndex] ^&= mask

NetfilterBug1766Fix bool
NFTableName string // if other projects use this controllers they need to be able to use their own table name
StrictMode bool // enforce network policies also on established connections
CTLabelAccept int // conntrack label to set on accepted connections (between 1-127)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't add a command line option for CTLabelAccept

Copy link
Contributor Author

@aojea aojea Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to reduce very much what is exposed to end users ... this is what is exposed to the library consumers, I prefer to expose to users in a case per case basis, the less visible flags the better and a signal we are doing things in the right direction


if c.evaluatePacket(ctx, &packet) {
verdict = nfqueue.NfAccept
err := c.nfq.SetVerdictWithLabel(*a.PacketID, verdict, generateLabelMask(c.config.CTLabelAccept))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... we should use existingLabel | CTLabelAccept right? Otherwise we're breaking other usage of ct labels on the host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, but I do not know if nfqueue gives us that information or if this will merge or remove existing labels ... so the alternative is to do a netlink call pero packet for getting the labels ... that does not sound great ...

Based on the last days adventures I really doubt there is much usage of ct labels so I think we can document this as a known issue and work on this once we get more feedback ...

}
} else {
verdict = nfqueue.NfDrop
err := c.nfq.SetVerdict(*a.PacketID, verdict)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your patch for nfq lets you pass nil to SetVerdictWithLabel to do nothing with the label, right? So this could be simplified a little...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have much confidence in netlink with undocumented behaviors, so better to be safer

&expr.Cmp{Op: expr.CmpOpEq, Register: 0x1, Data: []byte{unix.IPPROTO_UDP}},
&expr.Payload{DestRegister: 0x1, Base: expr.PayloadBaseTransportHeader, Offset: 2, Len: 2},
&expr.Cmp{Op: expr.CmpOpEq, Register: 0x1, Data: binaryutil.BigEndian.PutUint16(53)},
&expr.Counter{},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly related; if you're planning to keep this, split it out into a separate commit?

(but also, note that the reason that counters aren't automatic for nftables rules like they were with iptables is because there are performance implications (eg, cross-cpu cache invalidation or something).)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them for debugging and then I realized that are very useful, maybe adding counters for each line is excessive ... any suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just left the ones that detect the packets sent by skuid 0, usually kubelet and host processes and the one that matches the conntrack established state

&expr.Ct{Register: 0x1, SourceRegister: true, Key: expr.CtKeyLABELS},
&expr.Counter{},
},
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this rule translate to? Does it have any effect at all besides tricking netfilter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can see the output in pkg/dataplane/controller_test.go

		ct label 28 ct state established,related counter packets 0 bytes 0 accept
		ip saddr @podips-v4 counter packets 0 bytes 0 queue flags bypass to 102
		ip daddr @podips-v4 counter packets 0 bytes 0 queue flags bypass to 102
		ip6 saddr @podips-v6 counter packets 0 bytes 0 queue flags bypass to 102
		ip6 daddr @podips-v6 counter packets 0 bytes 0 queue flags bypass to 102
		ct label set 28 counter packets 0 bytes 0. <<<<<<<

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment calls this "fake", but doesn't that just say "set the label on every packet that doesn't get nfqueue'd"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the comment, the entry is only reached if the queue is bypassed, otherwise the queue verdict will terminate the chain evaluation

aojea added 3 commits October 22, 2025 20:05
The dataplane uses conntrack labels to identify the connections that are
already processed and skip the ones that are processed and established.
The dataplane now inspect the existing connections in the conntrack
table and evaluates against the current network policies, if one of the
connections is no longer valid the label is removed, so the packets gets
requeued and reevaluated.

The strict mode is enabled by default and runs at most every 30 seconds
once there is a change triggered in the dataplane, this is to avoid
performance issues for listing conntrack entries too often.
@aojea
Copy link
Contributor Author

aojea commented Oct 22, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

re-evaluate active connections when policies change

3 participants